
    Free energy Sequential Monte Carlo, application to mixture modelling

    We introduce a new class of Sequential Monte Carlo (SMC) methods, which we call free energy SMC. This class is inspired by free energy methods, which originate from Physics, and in which one samples from a biased distribution such that a given function $\xi(\theta)$ of the state $\theta$ is forced to be uniformly distributed over a given interval. From an initial sequence of distributions $(\pi_t)$ of interest, and a particular choice of $\xi(\theta)$, a free energy SMC sampler sequentially computes a sequence of biased distributions $(\tilde{\pi}_t)$ with the following properties: (a) the marginal distribution of $\xi(\theta)$ with respect to $\tilde{\pi}_t$ is approximately uniform over a specified interval, and (b) $\tilde{\pi}_t$ and $\pi_t$ have the same conditional distribution with respect to $\xi$. We apply our methodology to mixture posterior distributions, which are highly multimodal. In the mixture context, forcing certain hyper-parameters to higher values greatly facilitates mode swapping, and makes it possible to recover a symmetric output. We illustrate our approach with univariate and bivariate Gaussian mixtures and two real-world datasets.
    Comment: presented at "Bayesian Statistics 9" (Valencia meetings, 4-8 June 2010, Benidorm)
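    Property (a) amounts to reweighting by the inverse of the current marginal density of $\xi(\theta)$. The snippet below is a minimal sketch of that biasing step only, not the full free energy SMC sampler; the histogram estimate and all function names are illustrative choices, not the paper's.

```python
# Minimal sketch: reweight samples targeting pi_t so that the marginal of
# xi(theta) becomes ~uniform on [a, b]. The weighted histogram stands in
# for an estimate of the free energy A(xi) = -log p(xi).
import numpy as np

def free_energy_bias(theta, w, xi, a, b, n_bins=20):
    x = xi(theta)
    edges = np.linspace(a, b, n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    p_hat = np.zeros(n_bins)
    np.add.at(p_hat, idx, w)                 # weighted histogram of xi
    p_hat = np.maximum(p_hat, 1e-12)         # guard against empty bins
    w_biased = w / p_hat[idx]                # i.e. w * exp(A(xi))
    return w_biased / w_biased.sum()

rng = np.random.default_rng(0)
theta = rng.normal(size=10_000)              # stand-in for draws from pi_t
w = np.full(theta.size, 1.0 / theta.size)
w_tilde = free_energy_bias(theta, w, xi=np.abs, a=0.0, b=3.0)
```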

    The Poisson transform for unnormalised statistical models

    Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Here we show that inferring the parameters of an unnormalised model on a space $\Omega$ can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on $\Omega$. The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which includes for example unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parametric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive divergence of Gutmann & Hyvärinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.
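    The reduction to logistic regression can be sketched in a few lines. Below, a toy unnormalised model $\log f(x;\theta) = -\theta x^2/2$ (an arbitrary illustration) is fit by classifying data against reference points, with the log normalising constant $\nu$ treated as just another parameter. This mirrors the noise-contrastive construction mentioned above, not the paper's exact estimator.

```python
# Hedged sketch: fit an unnormalised model by logistic regression of data
# vs noise points, estimating the log normalising constant nu jointly.
# Model, noise distribution and sample sizes are illustrative choices.
import numpy as np
from scipy.optimize import minimize

def log_f(x, theta):
    return -0.5 * theta * x**2                # unnormalised log-density

def log_sigmoid(z):
    return -np.logaddexp(0.0, -z)             # stable log(1 / (1 + e^-z))

def loss(params, x_data, x_noise, log_q):
    theta, nu = params
    logit_d = log_f(x_data, theta) - nu - log_q(x_data)
    logit_n = log_f(x_noise, theta) - nu - log_q(x_noise)
    return -(log_sigmoid(logit_d).mean() + log_sigmoid(-logit_n).mean())

rng = np.random.default_rng(1)
x_data = rng.normal(size=5_000)               # true theta = 1
x_noise = rng.uniform(-5, 5, size=5_000)      # reference ("noise") points
log_q = lambda x: np.full_like(x, -np.log(10.0))  # Uniform(-5, 5) log-density
res = minimize(loss, x0=[0.5, 0.0], args=(x_data, x_noise, log_q))
theta_hat, nu_hat = res.x                     # nu_hat estimates log Z(theta_hat)
```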

    Application of Sequential Quasi-Monte Carlo to Autonomous Positioning

    Sequential Monte Carlo algorithms (also known as particle filters) are popular methods to approximate filtering (and related) distributions of state-space models. However, they converge at the slow $1/\sqrt{N}$ rate, which may be an issue in real-time data-intensive scenarios. We give a brief outline of SQMC (Sequential Quasi-Monte Carlo), a variant of SMC based on low-discrepancy point sets proposed by Gerber and Chopin (2015), which converges at a faster rate, and we illustrate the superior performance of SQMC on autonomous positioning problems.
    Comment: 5 pages, 4 figures
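    As a rough illustration of where the low-discrepancy points enter, the sketch below drives the propagation step of a basic sequential importance sampler with a scrambled Sobol' sequence via the inverse CDF. This is not full SQMC: the Hilbert-curve ordering and resampling that SQMC requires are omitted, and the linear-Gaussian model is a placeholder.

```python
# Simplified sketch: QMC uniforms replace iid uniforms in the propagation
# step. Toy model: x_t = 0.9 x_{t-1} + e_t, y_t ~ N(x_t, 1).
import numpy as np
from scipy.stats import qmc, norm

N, T = 2**10, 25
rng = np.random.default_rng(2)
y = rng.normal(size=T)                        # dummy observations
x = np.zeros(N)                               # particles
logw = np.zeros(N)
sobol = qmc.Sobol(d=1, scramble=True, seed=3)
for t in range(T):
    u = sobol.random(N).ravel()               # low-discrepancy uniforms in (0,1)
    x = 0.9 * x + norm.ppf(u)                 # propagate via inverse CDF
    logw += norm.logpdf(y[t], loc=x)          # reweight against y_t
w = np.exp(logw - logw.max())
w /= w.sum()
```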

    Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference

    ABC algorithms are notoriously expensive in computing time, as they require simulating many complete artificial datasets from the model. We advocate in this paper a "divide and conquer" approach to ABC, where we split the likelihood into $n$ factors, and combine in some way the $n$ "local" ABC approximations of these factors. This has two advantages: (a) such an approach is typically much faster than standard ABC, and (b) it makes it possible to use local summary statistics (i.e. summary statistics that depend only on the data-points that correspond to a single factor), rather than global summary statistics (that depend on the complete dataset). This greatly alleviates the bias introduced by summary statistics, and even removes it entirely in situations where local summary statistics are simply the identity function. We focus on EP (Expectation-Propagation), a convenient and powerful way to combine $n$ local approximations into a global approximation. Compared to the EP-ABC approach of Barthelmé and Chopin (2014), we present two variations: one based on the parallel EP algorithm of Cseke and Heskes (2011), which has the advantage of being implementable on a parallel architecture, and one version which bridges the gap between standard EP and parallel EP. We illustrate our approach with an expensive application of ABC, namely inference on spatial extremes.
    Comment: To appear in the forthcoming Handbook of Approximate Bayesian Computation (ABC), edited by S. Sisson, L. Fan, and M. Beaumont
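    In schematic form, the recipe is: run a cheap ABC step per data block using a local summary, moment-match each block's output to a Gaussian site, and combine sites in natural-parameter form. The toy below does a single parallel-EP-style pass on a Gaussian location model; the simulator, tolerance, and single-pass combination are simplifying assumptions, not the algorithms of the paper.

```python
# Toy sketch: local ABC per block + Gaussian moment matching, combined in
# precision (natural-parameter) form. Model: y_i ~ N(theta, 1), prior
# theta ~ N(0, 10); local summary = block mean; all settings illustrative.
import numpy as np

rng = np.random.default_rng(4)
simulate = lambda th, n: rng.normal(th, 1.0, size=n)
y = rng.normal(1.5, 1.0, size=200)
blocks = np.split(y, 10)
prior_var = 10.0
prior_draws = rng.normal(0.0, np.sqrt(prior_var), size=20_000)

def local_abc(y_block, eps=0.2):
    """Accept prior draws whose simulated block mean lies within eps."""
    sims = np.array([simulate(th, y_block.size).mean() for th in prior_draws])
    return prior_draws[np.abs(sims - y_block.mean()) < eps]

r_tot, q_tot = 0.0, 1.0 / prior_var           # start from the prior (mean 0)
for yb in blocks:
    acc = local_abc(yb)
    m, v = acc.mean(), acc.var()
    q_tot += 1.0 / v - 1.0 / prior_var        # site precision (cavity = prior)
    r_tot += m / v                            # site natural mean
mu_post, var_post = r_tot / q_tot, 1.0 / q_tot
```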

    Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian process

    A stationary Gaussian process is said to be long-range dependent (resp., anti-persistent) if its spectral density $f(\lambda)$ can be written as $f(\lambda)=|\lambda|^{-2d}g(|\lambda|)$, where $0<d<1/2$ (resp., $-1/2<d<0$), and $g$ is continuous and positive. We propose a novel Bayesian nonparametric approach for the estimation of the spectral density of such processes. We prove posterior consistency for both $d$ and $g$, under appropriate conditions on the prior distribution. We establish the rate of convergence for a general class of priors and apply our results to the family of fractionally exponential priors. Our approach is based on the true likelihood and does not resort to Whittle's approximation.
    Comment: Published at http://dx.doi.org/10.1214/11-AOS955 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
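    For concreteness, here is what a spectral density of the stated form looks like with a fractionally-exponential-type $g$, i.e. $\log g(\lambda)=\sum_j c_j\cos(j\lambda)$; the coefficients and memory parameter below are arbitrary illustrations, not values from the paper.

```python
# Evaluate f(lambda) = |lambda|^(-2d) * g(|lambda|) with a FEXP-style g.
import numpy as np

def spectral_density(lam, d, c):
    log_g = sum(cj * np.cos((j + 1) * lam) for j, cj in enumerate(c))
    return np.abs(lam) ** (-2.0 * d) * np.exp(log_g)

lam = np.linspace(1e-3, np.pi, 500)
f_long = spectral_density(lam, d=0.3, c=[0.5, -0.2])    # long memory: 0 < d < 1/2
f_anti = spectral_density(lam, d=-0.3, c=[0.5, -0.2])   # anti-persistent: d < 0
```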

    Negative association, ordering and convergence of resampling methods

    We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost-sure weak convergence of measures output from Kitagawa's (1996) stratified resampling method. Carpenter et al.'s (1999) systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of Srinivasan (2001), which shares some attractive properties of systematic resampling, but which exhibits negative association and therefore converges irrespective of the order of the input samples. We confirm a conjecture made by Kitagawa (1996) that ordering input samples by their states in $\mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $\mathbb{R}^d$, the variance of the resampling error is ${\scriptscriptstyle\mathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
    Comment: 54 pages, including 30 pages of supplementary materials (a typo in Algorithm 1 has been corrected)
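    For reference, the two classical schemes compared above fit in a few lines each; the difference is just whether each stratum gets its own uniform. The paper's new negatively associated scheme (based on Srinivasan's stochastic rounding) is not reproduced here.

```python
# Textbook implementations of stratified and systematic resampling.
# Systematic resampling reuses one shared uniform across all strata,
# which is why its behaviour depends on the order of the input samples.
import numpy as np

def stratified_resample(w, rng):
    N = w.size
    u = (np.arange(N) + rng.random(N)) / N    # one uniform per stratum
    return np.searchsorted(np.cumsum(w), u)

def systematic_resample(w, rng):
    N = w.size
    u = (np.arange(N) + rng.random()) / N     # a single shared uniform
    return np.searchsorted(np.cumsum(w), u)

rng = np.random.default_rng(5)
w = rng.dirichlet(np.ones(1_000))             # normalised weights
ancestors = stratified_resample(w, rng)       # indices of surviving particles
```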

    Computational aspects of Bayesian spectral density estimation

    Gaussian time-series models are often specified through their spectral density. Such models present several computational challenges, in particular because of the non-sparse nature of the covariance matrix. We derive a fast approximation of the likelihood for such models. We propose to sample from the approximate posterior (that is, the prior times the approximate likelihood), and then to recover the exact posterior through importance sampling. We show that the variance of the importance sampling weights vanishes as the sample size goes to infinity. We explain why the approximate posterior may typically be multimodal, and we derive a Sequential Monte Carlo sampler based on an annealing sequence in order to sample from that target distribution. Performance of the overall approach is evaluated on simulated and real datasets. In addition, for one real-world dataset, we provide some numerical evidence that a Bayesian approach to semi-parametric estimation of spectral density may provide more reasonable results than its Frequentist counterparts.
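    The correction step described above is plain importance sampling and can be written generically; the two log-likelihood callables below are placeholders for the paper's fast approximation and the exact Gaussian likelihood.

```python
# Generic importance-sampling correction: reweight draws from the
# approximate posterior by the exact/approximate likelihood ratio.
import numpy as np

def importance_correct(draws, exact_loglik, approx_loglik):
    logr = np.array([exact_loglik(th) - approx_loglik(th) for th in draws])
    w = np.exp(logr - logr.max())             # stabilised weights
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)                  # effective sample size
    return w, ess

# e.g. exact posterior mean: np.average(draws, weights=w)
```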

    Free Energy Methods for Bayesian Inference: Efficient Exploration of Univariate Gaussian Mixture Posteriors

    Because of their multimodality, mixture posterior distributions are difficult to sample with standard Markov chain Monte Carlo (MCMC) methods. We propose a strategy to enhance the sampling of MCMC in this context, using a biasing procedure which originates from computational Statistical Physics. The principle is first to choose a "reaction coordinate", that is, a "direction" in which the target distribution is multimodal. In a second step, the marginal log-density of the reaction coordinate with respect to the posterior distribution is estimated; minus this quantity is called "free energy" in the computational Statistical Physics literature. To this end, we use adaptive biasing Markov chain algorithms which adapt their targeted invariant distribution on the fly, in order to overcome sampling barriers along the chosen reaction coordinate. Finally, we perform an importance sampling step in order to remove the bias and recover the true posterior. The efficiency factor of the importance sampling step can easily be estimated a priori once the bias is known, and appears to be rather large for the test cases we considered. A crucial point is the choice of the reaction coordinate. One standard choice (used for example in the classical Wang-Landau algorithm) is minus the log-posterior density. We discuss other choices. We show in particular that the hyper-parameter that determines the order of magnitude of the variance of each component is both a convenient and an efficient reaction coordinate. We also show how to adapt the method to compute the evidence (marginal likelihood) of a mixture model. We illustrate our approach by analyzing two real data sets.
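    A bare-bones version of the adaptive biasing loop, on a toy bimodal target with the reaction coordinate $\xi(\theta)=\theta$ itself: visited bins of $\xi$ accumulate a penalty (the running free-energy estimate), which flattens the barrier between modes; a final importance-sampling step with weights $\exp(-\mathrm{bias})$ would remove the bias. Target, binning, and step-size schedule are all illustrative assumptions.

```python
# Wang-Landau-style adaptive biasing on a 1-d bimodal toy target.
import numpy as np

rng = np.random.default_rng(6)
logpi = lambda th: np.logaddexp(-0.5*(th - 3)**2, -0.5*(th + 3)**2)
edges = np.linspace(-6.0, 6.0, 25)
bias = np.zeros(edges.size - 1)               # running free-energy estimate

def bin_of(x):
    return int(np.clip(np.digitize(x, edges) - 1, 0, bias.size - 1))

th, samples = 0.0, []
for it in range(50_000):
    prop = th + rng.normal(scale=1.0)
    # Metropolis step on the biased target pi(theta) * exp(-bias(xi(theta)))
    log_acc = (logpi(prop) - bias[bin_of(prop)]) - (logpi(th) - bias[bin_of(th)])
    if np.log(rng.random()) < log_acc:
        th = prop
    bias[bin_of(th)] += 0.5 / (1.0 + it / 1_000)  # penalise visited bin, decaying
    samples.append(th)
```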

    Dynamic detection of change points in long time series

    We consider the problem of detecting change points (structural changes) in long sequences of data, whether in a sequential fashion or not, and without assuming prior knowledge of the number of these change points. We reformulate this problem as the Bayesian filtering and smoothing of a non-standard state-space model. Towards this goal, we build a hybrid algorithm that relies on particle filtering and MCMC ideas. The approach is illustrated by a GARCH change point model.
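    As a minimal illustration of the state-space reformulation, the toy filter below treats the current regime mean as the latent state, which renews from the prior with small probability at each step; a bootstrap particle filter then tracks it. The Gaussian toy stands in for the GARCH change point model, and the MCMC moves of the hybrid algorithm are omitted.

```python
# Toy changepoint state-space model: latent regime mean mu_t renews from
# the prior with probability p_change; bootstrap particle filter tracks it.
import numpy as np

rng = np.random.default_rng(7)
T, N, p_change = 200, 2_000, 0.01
true_mu = np.where(np.arange(T) < 100, 0.0, 3.0)
y = true_mu + rng.normal(size=T)              # simulated observations

mu = rng.normal(0.0, 3.0, size=N)             # particles: current regime mean
filt_means = []
for t in range(T):
    jump = rng.random(N) < p_change           # regime renewal indicators
    mu = np.where(jump, rng.normal(0.0, 3.0, size=N), mu)
    logw = -0.5 * (y[t] - mu)**2              # N(mu, 1) observation log-density
    w = np.exp(logw - logw.max())
    w /= w.sum()
    filt_means.append(np.sum(w * mu))         # filtering estimate of mu_t
    mu = mu[rng.choice(N, size=N, p=w)]       # multinomial resampling
```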